Morphoanalysis of Spanish Texts: Two Applications for Web Pages
Abstract
The applications described here build on work carried out in recent years by the Data Structures and Computational Linguistics Group at Las Palmas de Gran Canaria University. That work in computational linguistics has produced, among other results, tools for morphological identification and generation. This paper presents the use of those tools as components of new applications designed to exploit the large flow of linguistic information available on the Internet. Two kinds of application are identified, according to the degree of interactivity of the linguistic studies to be performed, and two prototypes, named DAWeb and NAWeb, are developed with special attention to their architecture in order to maximize the efficiency of both. Analysis modes include neologism detection, word use (qualitative and quantitative measurements), and some syntactic aspects such as lexical collocations and prepositional regimes.
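As a rough illustration of two of the analysis modes named above (quantitative word use and neologism detection), the following minimal Python sketch extracts the visible text of a web page, counts word forms, and flags forms missing from a reference lexicon as candidate neologisms. It is not the DAWeb or NAWeb implementation: the names `TextExtractor`, `page_words`, `analyse`, `TOY_LEXICON` and `SAMPLE` are hypothetical, and the toy lexicon of surface forms stands in for the group's morphological tools, which would recognize inflected and derived forms rather than require an exact match.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def page_words(html):
    """Return the lowercased Spanish word forms found in an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return re.findall(r"[a-záéíóúüñ]+", text)


def analyse(html, lexicon):
    """Count word forms and list those absent from the lexicon.

    Assumption: `lexicon` is a plain set of known surface forms; a real
    system would consult a morphological analyser instead.
    """
    counts = Counter(page_words(html))
    candidates = sorted(word for word in counts if word not in lexicon)
    return counts, candidates


# Hypothetical toy data, for illustration only.
TOY_LEXICON = {"el", "niño", "usa", "la", "red", "y", "quiere"}
SAMPLE = (
    "<html><body><p>El niño usa la red y quiere googlear. La red.</p>"
    "<script>var x = 1;</script></body></html>"
)

counts, candidates = analyse(SAMPLE, TOY_LEXICON)
```

Here `counts` holds the frequency of each word form on the page and `candidates` holds the forms the lexicon does not know. Matching on surface forms over-reports badly for a highly inflected language such as Spanish, which is why morphological identification matters for this kind of analysis.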
Publication date: 2003